Required Skills for CCA-175

Data Ingestion:

  • Import data from a MySQL database into HDFS using Sqoop
  • Export data to a MySQL database from HDFS using Sqoop
  • Change the delimiter and file format of data during import using Sqoop
  • Ingest real-time and near-real-time (NRT) streaming data into HDFS using Flume
  • Load data into and out of HDFS using the Hadoop File System (FS) commands
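The ingestion skills above can be sketched as a handful of command-line invocations. This is a sketch only: the JDBC host, database, credentials file, table names, and HDFS paths are placeholders for a CDH-style cluster, not values from the exam.

```shell
# Import a MySQL table into HDFS, changing the field delimiter to a tab
sqoop import \
  --connect jdbc:mysql://db.example.com/retail \
  --username sqoop_user --password-file /user/cloudera/.sqoop.pwd \
  --table orders \
  --target-dir /user/cloudera/orders \
  --fields-terminated-by '\t'

# Same import, but changing the file format instead
# (text delimiters do not apply to binary formats like Avro)
sqoop import \
  --connect jdbc:mysql://db.example.com/retail \
  --username sqoop_user --password-file /user/cloudera/.sqoop.pwd \
  --table orders \
  --target-dir /user/cloudera/orders_avro \
  --as-avrodatafile

# Export HDFS data back into an existing MySQL table
sqoop export \
  --connect jdbc:mysql://db.example.com/retail \
  --username sqoop_user --password-file /user/cloudera/.sqoop.pwd \
  --table order_summary \
  --export-dir /user/cloudera/order_summary

# Start a Flume agent; the conf file wires a source -> channel -> HDFS sink
flume-ng agent --name a1 --conf /etc/flume-ng/conf \
  --conf-file /etc/flume-ng/conf/agent.conf

# Plain HDFS round trip with the FS shell
hdfs dfs -put local_file.txt /user/cloudera/
hdfs dfs -get /user/cloudera/local_file.txt ./
```

Note that `--as-avrodatafile` and `--fields-terminated-by` address two different bullets: the former switches the on-disk file format, the latter only changes the delimiter of delimited text output.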

Transform, Stage, Store:

  • Load data from HDFS and store results back to HDFS using Spark
  • Join disparate datasets together using Spark
  • Calculate aggregate statistics (for example, an average or a sum) using Spark
  • Filter data into a smaller dataset using Spark
  • Write a query that produces ranked or sorted data using Spark
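All five Spark skills above fit in one short job. The sketch below assumes a PySpark installation and invents its own input paths and column names (`orders`, `customers`, `customer_id`, `amount`); it is submitted from the shell so it reads and writes HDFS like the exam tasks do.

```shell
# Sketch only: paths, schemas, and column names are placeholders.
cat > etl.py <<'EOF'
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cca175-etl").getOrCreate()

# Load data from HDFS
orders = spark.read.option("header", "true").option("inferSchema", "true") \
    .csv("/user/cloudera/orders")
customers = spark.read.option("header", "true").option("inferSchema", "true") \
    .csv("/user/cloudera/customers")

# Join disparate datasets on a shared key
joined = orders.join(customers, "customer_id")

# Calculate aggregate statistics (average and sum)
stats = joined.groupBy("customer_id").agg(
    F.avg("amount").alias("avg_amount"),
    F.sum("amount").alias("total_amount"))

# Filter into a smaller dataset, then produce sorted/ranked output
top = stats.filter(F.col("total_amount") > 1000) \
           .orderBy(F.col("total_amount").desc())

# Store the result back to HDFS
top.write.mode("overwrite").csv("/user/cloudera/top_customers")
EOF
spark-submit etl.py
```

Each DataFrame step maps to one bullet: `read`/`write` for load and store, `join` for combining datasets, `agg` for aggregate statistics, `filter` for reduction, and `orderBy` for ranked output.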

Data Analysis:

  • Read and create a table in the Hive metastore in a given schema
  • Extract an Avro schema from a set of data files using Avro-tools
  • Create a table in the Hive metastore using the Avro file format and an external schema file
  • Improve query performance by creating partitioned tables in the Hive metastore
  • Evolve an Avro schema by changing JSON files
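The Hive/Avro skills above chain together naturally. In this sketch the file names, table names, and HDFS locations are placeholders, and the Hive statements are run through `hive -e` for brevity:

```shell
# Extract the Avro schema embedded in an existing data file, then stage it on HDFS
avro-tools getschema /tmp/orders.avro > orders.avsc
hdfs dfs -put orders.avsc /user/cloudera/schemas/

# Create a Hive table backed by the Avro file format and the external schema file
hive -e "
CREATE EXTERNAL TABLE orders_avro
STORED AS AVRO
LOCATION '/user/cloudera/orders_avro'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/cloudera/schemas/orders.avsc');
"

# Partitioned table: queries that filter on order_month scan only matching partitions
hive -e "
CREATE TABLE orders_by_month (order_id INT, amount DOUBLE)
PARTITIONED BY (order_month STRING)
STORED AS AVRO;
"
```

Schema evolution then amounts to editing `orders.avsc` (a plain JSON file) and re-uploading it: adding a new field with a `"default"` value lets the evolved schema still read the old data files.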
